A machine-learning approach reveals that alignment properties alone can accurately predict inference of lateral gene transfer from discordant phylogenies.
نویسندگان
چکیده
Among the methods currently used in phylogenomic practice to detect the presence of lateral gene transfer (LGT), one of the most frequently employed is the comparison of gene tree topologies for different genes. In cases where the phylogenies for different genes are incompatible, or discordant, for well-supported branches there are three simple interpretations for the result: 1) gene duplications (paralogy) followed by many independent gene losses have occurred, 2) LGT has occurred, or 3) the phylogeny is well supported but for reasons unknown is nonetheless incorrect. Here, we focus on the third possibility by examining the properties of 22,437 published multiple sequence alignments, the Bayesian maximum likelihood trees for which either do or do not suggest the occurrence of LGT by the criterion of discordant branches. The alignments that produce discordant phylogenies differ significantly in several salient alignment properties from those that do not. Using a support vector machine, we were able to predict the inference of discordant tree topologies with up to 80% accuracy from alignment properties alone.
منابع مشابه
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملThe impact of reticulate evolution on genome phylogeny.
Genome phylogenies are used to build tree-like representations of evolutionary relationships among genomes. However, in condensing the phylogenetic signals within a set of genomes down to a single tree, these methods generally do not explicitly take into account discordant signals arising due to lateral genetic transfer. Because conflicting vertical and horizontal signals can produce compromise...
متن کاملImage alignment via kernelized feature learning
Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...
متن کاملPhyloGenie: automated phylome generation and analysis.
Phylogenetic reconstruction is the method of choice to determine the homologous relationships between sequences. Difficulties in producing high-quality alignments, which are the basis of good trees, and in automating the analysis of trees have unfortunately limited the use of phylogenetic reconstruction methods to individual genes or gene families. Due to the large number of sequences involved,...
متن کاملGenome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales.
Placement of the mitochondrial branch on the tree of life has been problematic. Sparse sampling, the uncertainty of how lateral gene transfer might overwrite phylogenetic signals, and the uncertainty of phylogenetic inference have all contributed to the issue. Here we address this issue using a supertree approach and completed genomic sequences. We first determine that a sensible alpha-proteoba...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Molecular biology and evolution
دوره 26 9 شماره
صفحات -
تاریخ انتشار 2009